CUDA & CPU: support F32 kernel type for CONV_TRANSPOSE_2D
#17094
base: master
Conversation
…types and improve parameter handling
- Introduced a `conv2d_transpose_params` struct for better parameter management.
- Updated `conv2d_transpose_kernel` to be templated for different kernel types (float and half).
- Modified `ggml_cuda_conv_2d_transpose_p0` to handle both F16 and F32 kernel types.
- Enhanced test cases to validate functionality for both kernel types.
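A minimal sketch of the shape this refactor describes, assuming single-channel data and a contiguous layout. The names `conv2d_transpose_params` and `conv2d_transpose_kernel` come from the commit message, but the struct fields, indexing, and the `launch_conv2d_transpose` helper below are illustrative assumptions, not the code from this PR (the actual dispatch lives in `ggml_cuda_conv_2d_transpose_p0`):

```cpp
// Illustrative sketch only: the real kernel in ggml-cuda handles channels,
// batches and the ggml memory layout; this shows just the templating and
// dispatch idea.
#include <cuda_fp16.h>

struct conv2d_transpose_params {
    int in_w, in_h;   // input spatial size
    int out_w, out_h; // output spatial size
    int k_w, k_h;     // kernel spatial size
    int stride;       // stride of the transposed convolution
};

// T is the kernel element type: float or half. The accumulator stays in
// float either way.
template <typename T>
static __global__ void conv2d_transpose_kernel(
        const float * input, const T * kernel, float * output,
        const conv2d_transpose_params p) {
    const int ox = blockIdx.x * blockDim.x + threadIdx.x;
    const int oy = blockIdx.y * blockDim.y + threadIdx.y;
    if (ox >= p.out_w || oy >= p.out_h) {
        return;
    }

    float acc = 0.0f;
    for (int ky = 0; ky < p.k_h; ++ky) {
        for (int kx = 0; kx < p.k_w; ++kx) {
            const int sy = oy - ky;
            const int sx = ox - kx;
            if (sy % p.stride != 0 || sx % p.stride != 0) {
                continue;
            }
            const int iy = sy / p.stride;
            const int ix = sx / p.stride;
            if (iy < 0 || iy >= p.in_h || ix < 0 || ix >= p.in_w) {
                continue;
            }
            // the kernel element is widened to float before accumulating
            acc += input[iy * p.in_w + ix] * (float) kernel[ky * p.k_w + kx];
        }
    }
    output[oy * p.out_w + ox] = acc;
}

// Hypothetical stand-in for the dispatch done in ggml_cuda_conv_2d_transpose_p0:
// pick the template instantiation based on the kernel tensor type.
static void launch_conv2d_transpose(
        const float * input, const void * kernel, float * output,
        const conv2d_transpose_params & p, bool kernel_is_f16, cudaStream_t stream) {
    const dim3 block(16, 16);
    const dim3 grid((p.out_w + block.x - 1) / block.x,
                    (p.out_h + block.y - 1) / block.y);
    if (kernel_is_f16) {
        conv2d_transpose_kernel<half><<<grid, block, 0, stream>>>(
            input, (const half *) kernel, output, p);
    } else {
        conv2d_transpose_kernel<float><<<grid, block, 0, stream>>>(
            input, (const float *) kernel, output, p);
    }
}
```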
…ernel types
- Updated `test_conv_transpose_2d` structure to improve parameter handling by reordering constructor arguments.
- Enhanced test case generation to iterate over kernel types, allowing for flexible testing of different configurations.
- Removed hardcoded kernel type instances in favor of a loop for better maintainability and scalability.
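A fragment in the spirit of that change, assuming the surrounding `test-backend-ops.cpp` context (the `test_cases` vector and the `test_conv_transpose_2d` constructor); the argument order and tensor shapes below are placeholders, not the actual test parameters:

```cpp
// Loop over the kernel tensor types instead of hardcoding one instance per type.
for (ggml_type kernel_type : {GGML_TYPE_F16, GGML_TYPE_F32}) {
    for (int stride : {1, 2}) {
        test_cases.emplace_back(new test_conv_transpose_2d(
            /* ne_input    */ {10, 10, 3, 1},
            /* ne_kernel   */ {3, 3, 3, 2},
            /* stride      */ stride,
            /* kernel_type */ kernel_type));
    }
}
```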
Does this PR make a difference to something? From what I understand, the kernel value is upcast to float before doing any accumulation (and accumulation is in f32 anyway). So unless there are kernels around that don't fit into f16, I don't see a benefit to supporting this, especially since we don't support f16 inputs yet (which incidentally might be more relevant than f32 kernels, as we could potentially do half2 multiplications).
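For illustration of that point (not code from the PR): with an f16 kernel the weight is widened to f32 before the multiply-accumulate, so an f32 kernel changes only the storage precision, not the accumulation; f16 inputs, by contrast, would open the door to packed `half2` math. A minimal sketch, assuming a GPU architecture with `half2` arithmetic (sm_53+):

```cpp
#include <cuda_fp16.h>

// f16 kernel, f32 input: the weight is converted to f32 before the
// multiply-accumulate, so accumulation precision matches an f32 kernel.
__device__ float fma_f16_weight(float x, half w, float acc) {
    return acc + x * __half2float(w);
}

// If the input were f16 too, two element pairs could be multiplied at once.
__device__ float2 fma_half2(half2 x, half2 w, float2 acc) {
    const float2 prod = __half22float2(__hmul2(x, w));
    return make_float2(acc.x + prod.x, acc.y + prod.y);
}
```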
So the motivations of this PR are:
That's because it matches the CPU capabilities exactly.
That would be a problem in a conversion to GGUF, not necessarily a problem to be solved here.
You should add the CPU version for the f32 kernel too; that way this PR makes more sense.
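A rough idea of what such a CPU path could look like, as a naive single-channel reference; the real ggml-cpu implementation works on ggml tensors with a rearranged kernel layout and is organized differently, so this only illustrates handling both kernel types:

```cpp
// Naive reference: accepts an f16 or f32 kernel and scatters each input
// value into the output. Expects ow == (iw - 1) * stride + kw and
// oh == (ih - 1) * stride + kh.
#include "ggml.h" // for ggml_fp16_t and ggml_fp16_to_fp32

static void conv_transpose_2d_ref(
        const float * src, int iw, int ih,
        const void * kernel, int kw, int kh, bool kernel_is_f16,
        float * dst, int ow, int oh, int stride) {
    for (int i = 0; i < ow * oh; ++i) {
        dst[i] = 0.0f;
    }
    for (int iy = 0; iy < ih; ++iy) {
        for (int ix = 0; ix < iw; ++ix) {
            const float v = src[iy * iw + ix];
            for (int ky = 0; ky < kh; ++ky) {
                for (int kx = 0; kx < kw; ++kx) {
                    // widen f16 weights to f32; f32 weights are used directly
                    const float w = kernel_is_f16
                        ? ggml_fp16_to_fp32(((const ggml_fp16_t *) kernel)[ky * kw + kx])
                        : ((const float *) kernel)[ky * kw + kx];
                    dst[(iy * stride + ky) * ow + (ix * stride + kx)] += v * w;
                }
            }
        }
    }
}
```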
…nd F32 tensor types.
…nhancing flexibility for different data types. Update test cases to include both F16 and F32 tensor types for comprehensive coverage.
CONV_TRANSPOSE_2D
Co-authored-by: Aman Gupta <[email protected]>
Force-pushed from 881538e to 175708c.
…pose_params struct and dispatching with direct kernel launch.
…d kernel type for improved flexibility with F16 and F32 data types.
Hi @am17an, thanks for reviewing this PR. Here’s what has been updated:
Please let me know if there's anything else you'd like changed.
am17an left a comment
I'm okay with the CUDA changes. Either @ggerganov or @slaren has to approve the ggml-cpu changes.
Also updated the test case in `test-backend-ops`. But since the F32 kernel type is not supported on CPU, only `GGML_TYPE_F16` is kept and `GGML_TYPE_F32` can be uncommented back in the future.